The data come from Amazon’s apparel review dataset (for more detailed information: chuck@emadri.com).
I picked 1,000 observations from the full dataset, then extracted adjectives from the “review_body” field and judged the category of each product from the “product_title” field. The final dataset contains the following variables:
- product_id: ID of the product being reviewed.
- review_id: ID of the review (all words extracted from the same review share one review_id).
- attributes: Type of the word stored in “value”, e.g. adj or comfort.
- value: Word extracted from the customer review.
- count: Number of times the word appears in a single review message.
- tf: Term frequency, i.e. the word's count divided by the number of words extracted from the same review.
- weight: Rarity weight of the word; higher values mean the word is rarer across reviews (the values are consistent with a tf-idf score).
- star_rating: The rating for the product, given by the reviewer.
- item_name: Product type (e.g. cardigan, cap, shirts).
- category: Category assigned by me for the convenience of the following analysis. It has 4 values: access (accessories), top, bot (bottoms) and under (underwear) (for more info: andy@emadri.com).
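The tf and weight columns in the sample rows are consistent with a standard tf-idf scheme: the review with three extracted words gives each word tf = 1/3, and weight grows as a word becomes rarer. The sketch below assumes tf = count / (number of words extracted from the review) and weight = tf × ln(N / n_w), where N is the number of reviews and n_w is the number of reviews containing the word (the convention used by, for example, tidytext's `bind_tf_idf`). The function name `tf_idf` and the input layout are hypothetical; the exact formula used to build this dataset is an assumption.

```python
import math
from collections import Counter


def tf_idf(reviews):
    """Compute count, tf and a tf-idf weight for every (review, word) pair.

    reviews: dict mapping review_id -> list of words extracted from that review.
    Returns a dict mapping (review_id, word) -> (count, tf, weight).
    """
    n_docs = len(reviews)
    # Document frequency: in how many reviews does each word occur at least once?
    doc_freq = Counter()
    for words in reviews.values():
        doc_freq.update(set(words))

    result = {}
    for review_id, words in reviews.items():
        counts = Counter(words)
        total = sum(counts.values())  # number of words extracted from this review
        for word, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[word])  # natural log, as in tidytext
            result[(review_id, word)] = (count, tf, tf * idf)
    return result
```

For example, `tf_idf({"r1": ["nice", "comfortable", "darker"], "r2": ["nice"]})` gives each word in `r1` a tf of 1/3, and "nice" a lower weight than "darker" because it appears in both reviews.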
##     product_id      review_id attributes             value count        tf     weight star_rating item_name category
## 306 B013CTPBPE R3N8ZXJENRDD1I        adj smalldisappointed     1 1.0000000 6.82437367           1  cardigan      top
## 307 B013CTPSJS R3VU0L98WUG5C2        adj              nice     1 0.3333333 0.58515649           5  cardigan      top
## 308 B013CTPSJS R3VU0L98WUG5C2    comfort       comfortable     1 0.3333333 0.64719058           5  cardigan      top
## 309 B013CTPSJS R3VU0L98WUG5C2        adj            darker     1 0.3333333 2.04374216           5  cardigan      top
## 434 B013CUFO5K R3E1ZZ2VDGY74P        adj             great     1 1.0000000 1.31498533           5       cap   access
## 435 B013CV7H0O R1JMX9BBKAD6OB        adj             rough     1 0.0400000 0.21752317           4    shirts      top
## 436 B013CV7H0O R1JMX9BBKAD6OB        adj               top     1 0.0400000 0.10599946           4    shirts      top
## 437 B013CV7H0O R1JMX9BBKAD6OB        adj         concerned     1 0.0400000 0.21752317           4    shirts      top
## 438 B013CV7H0O R1JMX9BBKAD6OB        adj             small     1 0.0400000 0.08147528           4    shirts      top
## 439 B013CV7H0O R1JMX9BBKAD6OB        adj              long     1 0.0400000 0.11055723           4    shirts      top
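The sample rows show item names collapsed into the four categories (cardigan and shirts become top, cap becomes access). A minimal sketch of judging the category from a product title with a keyword dictionary follows. Only "cardigan", "shirt" and "cap" are grounded in the rows above; every other keyword, the `CATEGORY_KEYWORDS` dictionary and the `categorize` function are hypothetical illustrations, not the dictionary actually used.

```python
import re

# Hypothetical keyword -> category dictionary. Only "cardigan", "shirt" and "cap"
# come from the sample rows; the remaining entries are illustrative guesses.
CATEGORY_KEYWORDS = {
    "cardigan": "top",
    "shirt": "top",
    "cap": "access",
    "jeans": "bot",
    "shorts": "bot",
    "bra": "under",
}


def categorize(product_title):
    """Return the first matching category for a product title, or None."""
    for word in re.findall(r"[a-z]+", product_title.lower()):
        # Accept simple plurals such as "shirts" -> "shirt".
        if word.endswith("s") and word[:-1] in CATEGORY_KEYWORDS:
            word = word[:-1]
        if word in CATEGORY_KEYWORDS:
            return CATEGORY_KEYWORDS[word]
    return None  # unmatched titles would need a manual decision
```

Returning `None` for unmatched titles keeps the manual judgment step visible instead of silently forcing every product into one of the four categories.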
The purpose of this first-step analysis is to find a set of keywords associated with positive reviews, and then to see whether these keywords offer insight into what makes a customer satisfied with his or her purchase.
Here, I divide the extracted words into two groups: positive and negative. Positive words come from reviews rated with more than 3 stars; negative words come from reviews rated with fewer than 3 stars, so words from 3-star reviews belong to neither group.
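The split rule can be sketched in a few lines, assuming each extracted word is paired with the star_rating of its review. The function name `split_by_rating` is hypothetical; the thresholds follow the rule stated above, with 3-star reviews treated as neutral.

```python
def split_by_rating(rows):
    """Split (word, star_rating) pairs into positive and negative word lists.

    Ratings above 3 are positive, ratings below 3 are negative,
    and 3-star rows are dropped as neutral.
    """
    positive, negative = [], []
    for word, stars in rows:
        if stars > 3:
            positive.append(word)
        elif stars < 3:
            negative.append(word)
        # stars == 3: excluded from both groups
    return positive, negative
```

Applied to the sample rows, "nice" (5 stars) and "rough" (4 stars) land in the positive list, while "smalldisappointed" (1 star) lands in the negative list.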
In the following plots, words drawn with larger circles appear more frequently in the reviews.
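A minimal sketch of how word frequency can drive circle size in such plots. How the original plots scale their circles is not stated, so this assumes radius proportional to the square root of frequency, which makes circle *area* proportional to frequency; `circle_sizes` and `max_radius` are hypothetical names.

```python
import math
from collections import Counter


def circle_sizes(words, max_radius=50.0):
    """Map each word to a circle radius whose area is proportional to its frequency."""
    freq = Counter(words)
    if not freq:
        return {}
    top = max(freq.values())
    # sqrt scaling: a word appearing 4x as often gets 2x the radius, i.e. 4x the area
    return {word: max_radius * math.sqrt(n / top) for word, n in freq.items()}
```

Scaling by area rather than radius avoids visually exaggerating the most frequent words, since readers tend to judge circle size by area.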
In general, there are 3 perspectives that affect how a product is reviewed:
Example words: unique, beautiful, perfect, great, excellent, good, hot, nice, cute, awesome, fashionable, chic
Example words: not heavy, soft, comfortable, adjustable, not too tight, not too large, stretchy, comfy, breathable
Example words: unbiased (description of the size/color of the product), honest, real, happy, promotional
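The three example-word lists can be used as a small keyword dictionary for tagging new reviews. The sketch below copies the lists above; the keys `perspective_1` to `perspective_3` are hypothetical placeholders because the perspective names are not given here, and matching on 1- to 3-word n-grams is an assumed way to catch phrases such as "not too tight".

```python
import re

# Example words copied from the three perspectives above.
# The key names are hypothetical placeholders.
PERSPECTIVES = {
    "perspective_1": {"unique", "beautiful", "perfect", "great", "excellent", "good",
                      "hot", "nice", "cute", "awesome", "fashionable", "chic"},
    "perspective_2": {"not heavy", "soft", "comfortable", "adjustable", "not too tight",
                      "not too large", "stretchy", "comfy", "breathable"},
    "perspective_3": {"unbiased", "honest", "real", "happy", "promotional"},
}


def tag_perspectives(text):
    """Count how many example words/phrases from each perspective occur in a review."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Build unigrams, bigrams and trigrams so multi-word phrases can match.
    grams = []
    for n in (1, 2, 3):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return {name: sum(g in vocab for g in grams) for name, vocab in PERSPECTIVES.items()}
```

For the review "Very soft and comfy, not too tight. Looks cute!", this counts one perspective_1 match (cute) and three perspective_2 matches (soft, comfy, not too tight). Negation is only handled for the listed phrases, so a review saying "not great" would still count "great".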